Some problems do not fall neatly into the category of traditional "machine learning", yet the tools for studying them can be quite powerful. The problem we will look at today involves relationships between things that can be organized as graphs, and the tool we will use to investigate it is called graph analytics or network analysis.
What is a "Network Graph"?
A network graph is a mathematical (and usually visual) structure that shows the relations (called edges) between points (called nodes). The graph shows how entities are interconnected: entities are displayed as nodes, and the edges connecting them are displayed as lines. Python has a number of modules designed to work with network graphs, and one of the most widely used is networkx.
Many analyses can be performed on a network. One of the more interesting looks for "clustering" of subsets of nodes, that is, groups of nodes that are more densely connected to each other than to the rest of the network. Finding these clusters is called "community detection".
Zachary's Karate Club is a small example network included with networkx that can be used to test a number of graph-analytics features, including community detection. See https://en.wikipedia.org/wiki/Zachary%27s_karate_club for more details.
From Wikipedia:
A social network of a karate club was studied by Wayne W. Zachary for a period of three years from 1970 to 1972.[2] The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club. During the study a conflict arose between the administrator "Officer" and instructor "Mr. Hi" (pseudonyms), which led to the split of the club into two. Half of the members formed a new club around Mr. Hi; members from the other part found a new instructor or gave up karate. Based on collected data Zachary correctly assigned all but one member of the club to the groups they actually joined after the split.
Let's get the Karate Club network, and print out information about the nodes and edges. We will see that the nodes have only one attribute (the club they ended up with) and the edges have no attribute, other than connecting two nodes.
Nodes could have many attributes, such as a label (in this case possibly the name of the student).
Edges could also have attributes, such as the strength of the connection between the two nodes (also called the weight).
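To make the idea of attributes concrete, here is a minimal sketch that stores a tiny graph as plain Python dictionaries. The member names and the weight values are invented for illustration and are not part of the Karate Club data. networkx keeps its data in a similar nested-dictionary form, which is why gk.nodes[node] in the code that follows returns a dictionary.

```python
# Toy graph stored as plain dictionaries (all values are made up for illustration).
# Node attributes: node id -> attribute dictionary.
nodes = {
    0: {"club": "Mr. Hi", "label": "Alice"},
    1: {"club": "Mr. Hi", "label": "Bob"},
    2: {"club": "Officer", "label": "Carol"},
}

# Edge attributes: (node, node) pair -> attribute dictionary.
edges = {
    (0, 1): {"weight": 3.0},
    (1, 2): {"weight": 1.0},
}

def neighbors(node):
    """Return the nodes that share an edge with `node` (edges are undirected)."""
    found = set()
    for u, v in edges:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return found

for node, attrs in nodes.items():
    print(node, attrs, "neighbors:", sorted(neighbors(node)))
```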
import networkx as nx
import community
#
# Get the graph from networkx
gk = nx.karate_club_graph()
print("Node information:")
for node in gk.nodes():
    print(gk.nodes[node])
print()
print("Edge information:")
for u,v,data in gk.edges(data=True):
    print(u, v, data)
Node information:
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Officer'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Mr. Hi'}
{'club': 'Mr. Hi'}
{'club': 'Officer'}
{'club': 'Mr. Hi'}
{'club': 'Officer'}
{'club': 'Mr. Hi'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
{'club': 'Officer'}
Edge information:
0 1 {}
0 2 {}
0 3 {}
0 4 {}
0 5 {}
0 6 {}
0 7 {}
0 8 {}
0 10 {}
0 11 {}
0 12 {}
0 13 {}
0 17 {}
0 19 {}
0 21 {}
0 31 {}
1 2 {}
1 3 {}
1 7 {}
1 13 {}
1 17 {}
1 19 {}
1 21 {}
1 30 {}
2 3 {}
2 7 {}
2 8 {}
2 9 {}
2 13 {}
2 27 {}
2 28 {}
2 32 {}
3 7 {}
3 12 {}
3 13 {}
4 6 {}
4 10 {}
5 6 {}
5 10 {}
5 16 {}
6 16 {}
8 30 {}
8 32 {}
8 33 {}
9 33 {}
13 33 {}
14 32 {}
14 33 {}
15 32 {}
15 33 {}
18 32 {}
18 33 {}
19 33 {}
20 32 {}
20 33 {}
22 32 {}
22 33 {}
23 25 {}
23 27 {}
23 29 {}
23 32 {}
23 33 {}
24 25 {}
24 27 {}
24 31 {}
25 31 {}
26 29 {}
26 33 {}
27 33 {}
28 31 {}
28 33 {}
29 32 {}
29 33 {}
30 32 {}
30 33 {}
31 32 {}
31 33 {}
32 33 {}
The following imports and helper functions will be useful for plotting and laying out our graphs.
import networkx as nx
import community
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from community import community_louvain
import plotly.express as px
import plotly.io as pio
pio.renderers.default='notebook'
def visualize_3d(X,colors,labels,color_text='Color',label_text='Label',algorithm="tsne",title="Data in 3D"):
    """
    Scatter-plot the rows of X in 3D with plotly.
    X is reduced with t-SNE or PCA first when it has more than three columns.
    colors sets the marker color; labels is shown when hovering over a point.
    """
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    if algorithm=="tsne":
        reducer = TSNE(n_components=3,random_state=47,n_iter=300,early_exaggeration=3.0)
    elif algorithm=="pca":
        reducer = PCA(n_components=3,random_state=47)
    else:
        raise ValueError("Unsupported dimensionality reduction algorithm given.")
    if X.shape[1]>3:
        X = reducer.fit_transform(X)
    else:
        if type(X)==pd.DataFrame:
            X=X.values
    # Colors are converted to strings so plotly treats them as discrete categories
    colors = pd.Series(colors)
    colors = colors.apply(str)
    fig = px.scatter_3d(x=X[:,0], y=X[:,1], z=X[:,2],color=colors,labels=labels,
                        custom_data=[colors,labels],
                        color_discrete_sequence=px.colors.qualitative.Dark24,
                        size_max=5.0)
    fig.update_traces(marker={'size': 3})
    fig.update_traces(marker=dict(size=12,
                                  line=dict(width=2,
                                            color='DarkSlateGrey')),
                      selector=dict(mode='markers'))
    fig.update_traces(hovertemplate=color_text+':%{customdata[0]}<br>'+label_text+':%{customdata[1]}') #
    fig.show()
def visualize_2d(X,colors,labels,color_text='Color',label_text='Label',algorithm="tsne",title="Data in 2D"):
    """
    Scatter-plot the rows of X in 2D with plotly.
    X is reduced with t-SNE or PCA first when it has more than two columns.
    colors sets the marker color; labels is shown when hovering over a point.
    """
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    if algorithm=="tsne":
        reducer = TSNE(n_components=2,random_state=47,n_iter=300,early_exaggeration=3.0)
    elif algorithm=="pca":
        reducer = PCA(n_components=2,random_state=47)
    else:
        raise ValueError("Unsupported dimensionality reduction algorithm given.")
    if X.shape[1]>2:
        print("transforming")
        X = reducer.fit_transform(X)
    else:
        if type(X)==pd.DataFrame:
            X=X.values
    # Colors are converted to strings so plotly treats them as discrete categories
    colors = pd.Series(colors)
    colors = colors.apply(str)
    fig = px.scatter(x=X[:,0], y=X[:,1],color=colors,labels=labels,
                     custom_data=[colors,labels],
                     color_discrete_sequence=px.colors.qualitative.Dark24,
                     size_max=5.0)
    fig.update_traces(marker={'size': 3})
    fig.update_traces(marker=dict(size=12,
                                  line=dict(width=2,
                                            color='DarkSlateGrey')),
                      selector=dict(mode='markers'))
    fig.update_traces(hovertemplate=color_text+':%{customdata[0]}<br>'+label_text+':%{customdata[1]}') #
    fig.show()
def community_layout(g, partition):
    """
    Compute the layout for a modular graph.

    Arguments:
    ----------
    g -- networkx.Graph or networkx.DiGraph instance
        graph to plot
    partition -- dict mapping int node -> int community
        graph partitions

    Returns:
    --------
    pos -- dict mapping int node -> (float x, float y)
        node positions
    """
    #
    # The scale parameter controls how far apart the clusters are
    pos_communities = _position_communities(g, partition, scale=5.)
    # The scale parameter controls how far apart the nodes are
    pos_nodes = _position_nodes(g, partition, scale=0.4)
    # combine positions
    pos = dict()
    for node in g.nodes():
        pos[node] = pos_communities[node] + pos_nodes[node]
    return pos
def _position_communities(g, partition, **kwargs):
    # create a weighted graph, in which each node corresponds to a community,
    # and each edge weight to the number of edges between communities
    between_community_edges = _find_between_community_edges(g, partition)
    communities = set(partition.values())
    hypergraph = nx.DiGraph()
    hypergraph.add_nodes_from(communities)
    for (ci, cj), edges in between_community_edges.items():
        hypergraph.add_edge(ci, cj, weight=len(edges))
    # find layout for communities
    pos_communities = nx.spring_layout(hypergraph, **kwargs)
    # set node positions to position of community
    pos = dict()
    for node, community in partition.items():
        pos[node] = pos_communities[community]
    return pos
def _find_between_community_edges(g, partition):
    edges = dict()
    for (ni, nj) in g.edges():
        ci = partition[ni]
        cj = partition[nj]
        if ci != cj:
            try:
                edges[(ci, cj)] += [(ni, nj)]
            except KeyError:
                edges[(ci, cj)] = [(ni, nj)]
    return edges
def _position_nodes(g, partition, **kwargs):
    """
    Positions nodes within communities.
    """
    communities = dict()
    for node, community in partition.items():
        try:
            communities[community] += [node]
        except KeyError:
            communities[community] = [node]
    pos = dict()
    for ci, nodes in communities.items():
        subgraph = g.subgraph(nodes)
        pos_subgraph = nx.spring_layout(subgraph, **kwargs)
        pos.update(pos_subgraph)
    return pos
def test():
    # to install networkx 2.0 compatible version of python-louvain use:
    # pip install -U git+https://github.com/taynaud/python-louvain.git@networkx2
    g = nx.karate_club_graph()
    partition = community_louvain.best_partition(g, resolution=1.5)
    print("Number of found communities",len(set(partition.values())))
    pos = community_layout(g, partition)
    values = [partition.get(node) for node in g.nodes()]
    nx.draw(g, pos, node_color=values)
    plt.show()
    return
Once a network is made, a simple way to look for structure is to see if subsets of the nodes cluster together.
One of the best tools for this is "Louvain community detection" (https://perso.uclouvain.be/vincent.blondel/research/louvain.html).
In the python-louvain package used here, there is a resolution parameter which controls how many clusters are typically found. The default is 1.0. Smaller values yield more clusters, while larger values yield fewer clusters.
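The Louvain method searches for a partition with high modularity, a score that compares the number of edges inside each community with the number expected if edges were placed at random. The following is a minimal pure-Python sketch of that score for an unweighted graph. It ignores edge weights and the resolution parameter, and the six-node network is a made-up toy, so treat it as an illustration of the quantity being optimized rather than of the library's implementation.

```python
from collections import defaultdict

def modularity(edges, partition):
    """Modularity of an unweighted, undirected graph for a node -> community map."""
    m = len(edges)                      # total number of edges
    internal = defaultdict(int)         # edges with both ends in the same community
    degree_sum = defaultdict(int)       # sum of node degrees per community
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by a single edge (a made-up toy network).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

print(modularity(edges, two_groups))  # positive: the split matches the two triangles
print(modularity(edges, one_group))   # 0.0: a single group carries no structure
```

Splitting the toy network along its single bridge edge scores higher than lumping every node together, which is the kind of comparison a modularity-based method makes when it decides whether to merge groups.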
#
# Do community detection
partition_gk = community_louvain.best_partition(gk, resolution=1.5)
print("Number of found communities",len(set(partition_gk.values())))
#
# Layout the network so that the communities are clustered
pos = community_layout(gk, partition_gk)
xpos = []
community = []
label = []
for node in gk.nodes():
    community.append(partition_gk.get(node))
    xpos.append(pos[node])
    label.append(gk.nodes[node]['club'])
xpos = np.asarray(xpos, dtype=np.float32)
#
# Now draw
nx.draw(gk, pos, node_color=community)
plt.show()
#
# Visualize using plotly
visualize_2d(xpos,colors=community,labels=label,color_text='Found Community',label_text='True Club',title="Karate Network")
Number of found communities 2
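With the communities found, a natural follow-up is to check how well they match the clubs that members actually joined. The sketch below cross-tabulates two short lists shaped like the community and label lists built above. The values are invented for illustration and are not the real Karate Club result, and the "purity" score is just one simple, assumed way to summarize agreement.

```python
from collections import Counter

# Hypothetical found communities and true clubs, one entry per node.
community = [0, 0, 0, 1, 1, 1]
label = ["Mr. Hi", "Mr. Hi", "Officer", "Officer", "Officer", "Officer"]

# Count how often each (found community, true club) combination occurs.
table = Counter(zip(community, label))
for (found, club), n in sorted(table.items()):
    print(f"community {found}, club {club}: {n}")

# Purity: fraction of nodes whose club is the majority club of their community.
majority = Counter()
for (found, club), n in table.items():
    majority[found] = max(majority[found], n)
purity = sum(majority.values()) / len(community)
print(purity)
```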

Our second example, a social network of Marvel comic-book characters, is not included with networkx, so we have to build it ourselves.
To do this, we will use a file containing a list of comic books and the "heroes" that appear in them. This dataset is from https://www.kaggle.com/csanhueza/the-marvel-universe-social-network.
Each line of the data contains a "hero" and a "comic book" in which that hero appears.
import pandas as pd
heroes_comics = pd.read_csv('heroes_comics.csv')
print(heroes_comics.head(50))
    hero                  comic
0   24-HOUR MAN/EMMANUEL  AA2 35
1   3-D MAN/CHARLES CHAN  AVF 4
2   3-D MAN/CHARLES CHAN  AVF 5
3   3-D MAN/CHARLES CHAN  COC 1
4   3-D MAN/CHARLES CHAN  H2 251
5   3-D MAN/CHARLES CHAN  H2 252
6   3-D MAN/CHARLES CHAN  M/PRM 35
7   3-D MAN/CHARLES CHAN  M/PRM 36
8   3-D MAN/CHARLES CHAN  M/PRM 37
9   3-D MAN/CHARLES CHAN  WI? 9
10  4-D MAN/MERCURIO      CA3 36
11  4-D MAN/MERCURIO      CM 51
12  4-D MAN/MERCURIO      Q 14
13  4-D MAN/MERCURIO      Q 16
14  4-D MAN/MERCURIO      T 208
15  4-D MAN/MERCURIO      T 214
16  4-D MAN/MERCURIO      T 215
17  4-D MAN/MERCURIO      T 216
18  4-D MAN/MERCURIO      T 440
19  8-BALL/               SLEEP 1
20  8-BALL/               SLEEP 19
21  8-BALL/               SLEEP 2
22  ABBOTT, JACK          DD/SM 1
23  ABCISSA               W2 52
24  ABCISSA               W2 53
25  ABEL                  XFOR 108
26  ABEL                  XFOR 109
27  ABOMINATION/EMIL BLO  ABOM 2
28  ABOMINATION/EMIL BLO  ABOM 3
29  ABOMINATION/EMIL BLO  ASM 23
30  ABOMINATION/EMIL BLO  H 15
31  ABOMINATION/EMIL BLO  H 20
32  ABOMINATION/EMIL BLO  H2 136
33  ABOMINATION/EMIL BLO  H2 137
34  ABOMINATION/EMIL BLO  H2 159
35  ABOMINATION/EMIL BLO  H2 171
36  ABOMINATION/EMIL BLO  H2 194
37  ABOMINATION/EMIL BLO  H2 195
38  ABOMINATION/EMIL BLO  H2 196
39  ABOMINATION/EMIL BLO  H2 270
40  ABOMINATION/EMIL BLO  H2 278
41  ABOMINATION/EMIL BLO  H2 287
42  ABOMINATION/EMIL BLO  H2 288
43  ABOMINATION/EMIL BLO  H2 289
44  ABOMINATION/EMIL BLO  H2 290
45  ABOMINATION/EMIL BLO  H2 364
46  ABOMINATION/EMIL BLO  H2 366
47  ABOMINATION/EMIL BLO  H2 382
48  ABOMINATION/EMIL BLO  H2 383
49  ABOMINATION/EMIL BLO  H2 384
To link heroes, we will loop over the dataframe above and, for each row, increment that hero's appearance count and add the hero to the list of heroes for that comic book.
Once we have the list of heroes for each comic book, we can loop over the comic books and link heroes by the comic books they have in common.
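Before running this on the full dataset, it can help to see the two steps on a toy table. The sketch below uses hypothetical hero and comic names. Unlike the nested dictionaries used in the code that follows, it stores each pair once under a sorted key and drops duplicate hero entries within a comic; both choices are assumptions made to keep the example short.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (hero, comic) rows standing in for the real dataframe.
rows = [
    ("HERO A", "COMIC 1"),
    ("HERO B", "COMIC 1"),
    ("HERO C", "COMIC 1"),
    ("HERO A", "COMIC 2"),
    ("HERO B", "COMIC 2"),
]

# Step 1: collect the heroes that appear in each comic.
heroes_by_comic = defaultdict(list)
for hero, comic in rows:
    heroes_by_comic[comic].append(hero)

# Step 2: count every unordered pair of heroes sharing a comic.
pair_counts = Counter()
for heroes in heroes_by_comic.values():
    for h1, h2 in combinations(sorted(set(heroes)), 2):
        pair_counts[(h1, h2)] += 1

print(pair_counts)
```

Here HERO A and HERO B share two comics, while every other pair shares only one.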
from collections import defaultdict
from functools import partial
from itertools import repeat
def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()
from itertools import combinations
#
# Loop over dataframe, and for each comic, store all of the heroes that appear in that comic
comicHeroList = defaultdict(list)
heroCount = defaultdict(int)
for index, row in heroes_comics.iterrows():
    hero = row['hero']
    comic = row['comic']
    heroCount[hero] += 1
    comicHeroList[comic].append(hero)
#
# Now loop over comics, and count how often heroes show up together
heroPairCount = nested_defaultdict(int,2)
for comic in comicHeroList:
    combos = combinations(comicHeroList[comic],2)
    for (h1,h2) in combos:
        heroPairCount[h1][h2] += 1
        heroPairCount[h2][h1] += 1
Now let's print out the ten heroes who appear in the most comics, and then for each of these, print out the ten heroes they appear together with most often.
for hero in sorted(heroCount, key=heroCount.get, reverse=True)[:10]:
    print("Hero: ",hero,"; comic count ",heroCount[hero])
    for hero2 in sorted(heroPairCount[hero], key=heroPairCount[hero].get, reverse=True)[:10]:
        print(" appears with ",hero2,"; number comics ",heroPairCount[hero][hero2])
Hero: SPIDER-MAN/PETER PARKER ; comic count 1577
    appears with WATSON-PARKER, MARY ; number comics 614
    appears with JAMESON, J. JONAH ; number comics 514
    appears with PARKER, MAY ; number comics 371
    appears with ROBERTSON, JOE ; number comics 355
    appears with LEEDS, BETTY BRANT ; number comics 246
    appears with THOMPSON, EUGENE FLA ; number comics 238
    appears with OSBORN, HARRY ; number comics 181
    appears with HUMAN TORCH/JOHNNY S ; number comics 147
    appears with CAPTAIN AMERICA ; number comics 145
    appears with OSBORN, LIZ ALLAN ; number comics 137
Hero: CAPTAIN AMERICA ; comic count 1334
    appears with IRON MAN/TONY STARK ; number comics 440
    appears with VISION ; number comics 385
    appears with THOR/DR. DONALD BLAK ; number comics 380
    appears with WASP/JANET VAN DYNE ; number comics 376
    appears with SCARLET WITCH/WANDA ; number comics 373
    appears with HAWK ; number comics 319
    appears with ANT-MAN/DR. HENRY J. ; number comics 289
    appears with JARVIS, EDWIN ; number comics 246
    appears with WONDER MAN/SIMON WIL ; number comics 215
    appears with FALCON/SAM WILSON ; number comics 189
Hero: IRON MAN/TONY STARK ; comic count 1150
    appears with CAPTAIN AMERICA ; number comics 440
    appears with SCARLET WITCH/WANDA ; number comics 375
    appears with THOR/DR. DONALD BLAK ; number comics 339
    appears with VISION ; number comics 335
    appears with WASP/JANET VAN DYNE ; number comics 296
    appears with HAWK ; number comics 294
    appears with ANT-MAN/DR. HENRY J. ; number comics 286
    appears with WONDER MAN/SIMON WIL ; number comics 249
    appears with IRON MAN IV/JAMES R. ; number comics 191
    appears with JARVIS, EDWIN ; number comics 185
Hero: THING/BENJAMIN J. GR ; comic count 963
    appears with HUMAN TORCH/JOHNNY S ; number comics 724
    appears with MR. FANTASTIC/REED R ; number comics 690
    appears with INVISIBLE WOMAN/SUE ; number comics 650
    appears with RICHARDS, FRANKLIN B ; number comics 198
    appears with MASTERS, ALICIA REIS ; number comics 178
    appears with CAPTAIN AMERICA ; number comics 170
    appears with IRON MAN/TONY STARK ; number comics 130
    appears with THOR/DR. DONALD BLAK ; number comics 126
    appears with SPIDER-MAN/PETER PARKER ; number comics 125
    appears with CRYSTAL [INHUMAN] ; number comics 122
Hero: THOR/DR. DONALD BLAK ; comic count 956
    appears with CAPTAIN AMERICA ; number comics 380
    appears with IRON MAN/TONY STARK ; number comics 339
    appears with ODIN [ASGARDIAN] ; number comics 266
    appears with SCARLET WITCH/WANDA ; number comics 256
    appears with VISION ; number comics 256
    appears with WASP/JANET VAN DYNE ; number comics 233
    appears with HAWK ; number comics 212
    appears with BALDER [ASGARDIAN] ; number comics 209
    appears with SIF ; number comics 204
    appears with VOLSTAGG ; number comics 187
Hero: HUMAN TORCH/JOHNNY S ; comic count 886
    appears with THING/BENJAMIN J. GR ; number comics 724
    appears with MR. FANTASTIC/REED R ; number comics 694
    appears with INVISIBLE WOMAN/SUE ; number comics 675
    appears with RICHARDS, FRANKLIN B ; number comics 202
    appears with CAPTAIN AMERICA ; number comics 162
    appears with SPIDER-MAN/PETER PARKER ; number comics 147
    appears with MASTERS, ALICIA REIS ; number comics 140
    appears with SHE-HULK/JENNIFER WA ; number comics 128
    appears with THOR/DR. DONALD BLAK ; number comics 127
    appears with CRYSTAL [INHUMAN] ; number comics 119
Hero: MR. FANTASTIC/REED R ; comic count 854
    appears with HUMAN TORCH/JOHNNY S ; number comics 694
    appears with THING/BENJAMIN J. GR ; number comics 690
    appears with INVISIBLE WOMAN/SUE ; number comics 682
    appears with RICHARDS, FRANKLIN B ; number comics 228
    appears with CAPTAIN AMERICA ; number comics 167
    appears with MASTERS, ALICIA REIS ; number comics 143
    appears with IRON MAN/TONY STARK ; number comics 130
    appears with THOR/DR. DONALD BLAK ; number comics 129
    appears with SHE-HULK/JENNIFER WA ; number comics 122
    appears with SPIDER-MAN/PETER PARKER ; number comics 120
Hero: HULK/DR. ROBERT BRUC ; comic count 835
    appears with BANNER, BETTY ROSS T ; number comics 249
    appears with ROSS, GEN. THADDEUS ; number comics 207
    appears with JONES, RICHARD MILHO ; number comics 174
    appears with DR. STRANGE/STEPHEN ; number comics 155
    appears with TALBOT, GLENN ; number comics 120
    appears with NORRISS, SISTER BARB ; number comics 118
    appears with CAPTAIN AMERICA ; number comics 113
    appears with DOC SAMSON/DR. LEONA ; number comics 110
    appears with SUB-MARINER/NAMOR MA ; number comics 105
    appears with HUMAN TORCH/JOHNNY S ; number comics 104
Hero: WOLVERINE/LOGAN ; comic count 819
    appears with STORM/ORORO MUNROE S ; number comics 389
    appears with COLOSSUS II/PETER RA ; number comics 331
    appears with CYCLOPS/SCOTT SUMMER ; number comics 283
    appears with NIGHTCRAWLER/KURT WA ; number comics 268
    appears with PROFESSOR X/CHARLES ; number comics 255
    appears with ROGUE / ; number comics 231
    appears with SHADOWCAT/KATHERINE ; number comics 169
    appears with PSYLOCKE/ELISABETH B ; number comics 163
    appears with BEAST/HENRY &HANK& P ; number comics 156
    appears with MARVEL GIRL/JEAN GRE ; number comics 156
Hero: INVISIBLE WOMAN/SUE ; comic count 762
    appears with MR. FANTASTIC/REED R ; number comics 682
    appears with HUMAN TORCH/JOHNNY S ; number comics 675
    appears with THING/BENJAMIN J. GR ; number comics 650
    appears with RICHARDS, FRANKLIN B ; number comics 230
    appears with CAPTAIN AMERICA ; number comics 151
    appears with MASTERS, ALICIA REIS ; number comics 140
    appears with SHE-HULK/JENNIFER WA ; number comics 123
    appears with IRON MAN/TONY STARK ; number comics 116
    appears with THOR/DR. DONALD BLAK ; number comics 112
    appears with SUB-MARINER/NAMOR MA ; number comics 102
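The tables above are interesting, but it is a little difficult to tell whether the heroes are just randomly connected or whether there is some structure to these connections.
As with the Karate network above, we can learn some interesting things about the relationships among our heroes if we form them into a network. Our network will have (at least) two primary features: nodes, one per hero, each carrying the number of comics in which that hero appears; and edges, each linking two heroes who appear together, weighted by the number of comics they share.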
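Building the network means deciding which co-appearances are strong enough to become edges. As a rough sketch of that filtering step, the example below applies a cut to a handful of hypothetical pair counts; the hero names and counts are invented, and edge_cut simply mirrors the edgeCut threshold used in the code that follows.

```python
# Hypothetical pair counts: (hero, hero) -> number of shared comics.
pair_counts = {
    ("HERO A", "HERO B"): 40,
    ("HERO A", "HERO C"): 3,
    ("HERO B", "HERO D"): 26,
    ("HERO C", "HERO D"): 25,
}
edge_cut = 25  # keep a link only if the pair shares more than this many comics

# Weighted edge list that survives the cut.
edges = [(h1, h2, n) for (h1, h2), n in pair_counts.items() if n > edge_cut]

# Nodes are the heroes that still have at least one edge.
nodes = sorted({h for h1, h2, _ in edges for h in (h1, h2)})

print(edges)
print(nodes)
```

Note that the comparison is strict, so a pair sharing exactly 25 comics is dropped, and HERO C disappears from the network because none of its links pass the cut.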
# Now look for communities
G = nx.Graph()
nodeCountCut = 50.0
edgeCut = 25.0
#
# Now find edges that connect good nodes
numEdges = 0
numGoodEdges = 0
goodEdgeNodes = set()
nodeEdgeCount = defaultdict(int)
# Note: heroPairCount is symmetric, so each pair is visited twice (once per
# direction) and the two edge counters are double the number of distinct pairs.
for index1 in heroPairCount:
    for index2 in heroPairCount[index1]:
        if index1 != index2:
            numEdges += 1
            if heroPairCount[index1][index2] > edgeCut:
                numGoodEdges += 1
                G.add_edge(index1, index2, weight=heroPairCount[index1][index2])
                goodEdgeNodes.add(index1)
                goodEdgeNodes.add(index2)
                nodeEdgeCount[index1] += 1
                nodeEdgeCount[index2] += 1
#
# Next add to graph only those nodes that actually have at least one connection!
numNodes = 0
numGoodNodes = 0
for hero in heroCount:
    if hero in goodEdgeNodes:
        G.add_node(hero,weight=heroCount[hero],hero=hero)
        numNodes += 1
        if nodeEdgeCount[hero]>0:
            numGoodNodes += 1
print("Total number all nodes ",numNodes)
print("Total number passing cuts nodes ",numGoodNodes)
print("Total number all edges ",numEdges)
print("Total number good edges ",numGoodEdges)
Total number all nodes 478
Total number passing cuts nodes 478
Total number all edges 343288
Total number good edges 4878
# First compute the best partition
# The smaller the "resolution", the more communities you get
resolution = 1.0
partition = community_louvain.best_partition(G,weight='weight', resolution=resolution)
print("Number of found communities",len(set(partition.values())))
Number of found communities 26
The following shows how to connect the found communities to the original list of heroes.
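The ten largest communities and their most frequently appearing members are then printed, and the results look sensible: for example, the community containing WOLVERINE/LOGAN gathers X-Men characters, while the one containing THING/BENJAMIN J. GR gathers the Fantastic Four and their supporting cast.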
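The partition returned by community detection maps each node to a community ID, so grouping heroes by community amounts to inverting that dictionary. A minimal sketch with a hypothetical five-hero partition (the names and IDs are invented for illustration):

```python
from collections import defaultdict

# Hypothetical output of community detection: node -> community id.
partition = {"HERO A": 0, "HERO B": 0, "HERO C": 1, "HERO D": 0, "HERO E": 1}

# Invert the mapping: community id -> list of members.
members = defaultdict(list)
for node, community_id in partition.items():
    members[community_id].append(node)

# List communities from largest to smallest.
ranked = sorted(members, key=lambda c: len(members[c]), reverse=True)
for community_id in ranked:
    print(community_id, members[community_id])
```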
communityList = defaultdict(list)
communityCount = defaultdict(int)
communityHeroCount = nested_defaultdict(int,2)
for hero,communityID in partition.items():
    # print("communityID ",communityID,"; communityIndex ",hero)
    communityList[communityID].append(hero)
    communityCount[communityID] += 1
    communityHeroCount[communityID][hero] = heroCount[hero]
for communityID in sorted(communityCount, key=communityCount.get, reverse=True)[:10]:
    print("Community ID ",communityID,"; number of members ",communityCount[communityID])
    for hero in sorted(communityHeroCount[communityID], key=communityHeroCount[communityID].get, reverse=True)[:10]:
        print(" hero ",hero,"count ",communityHeroCount[communityID][hero])
Community ID 21 ; number of members 90
    hero WOLVERINE/LOGAN count 819
    hero BEAST/HENRY &HANK& P count 635
    hero CYCLOPS/SCOTT SUMMER count 585
    hero STORM/ORORO MUNROE S count 523
    hero PROFESSOR X/CHARLES count 496
    hero MARVEL GIRL/JEAN GRE count 466
    hero COLOSSUS II/PETER RA count 452
    hero ANGEL/WARREN KENNETH count 444
    hero NIGHTCRAWLER/KURT WA count 444
    hero ICEMAN/ROBERT BOBBY count 427
Community ID 5 ; number of members 88
    hero CAPTAIN AMERICA count 1334
    hero IRON MAN/TONY STARK count 1150
    hero SCARLET WITCH/WANDA count 643
    hero HAWK count 605
    hero VISION count 603
    hero WASP/JANET VAN DYNE count 581
    hero ANT-MAN/DR. HENRY J. count 561
    hero FURY, COL. NICHOLAS count 471
    hero SHE-HULK/JENNIFER WA count 415
    hero JARVIS, EDWIN count 399
Community ID 9 ; number of members 68
    hero SPIDER-MAN/PETER PARKER count 1577
    hero WATSON-PARKER, MARY count 622
    hero DAREDEVIL/MATT MURDO count 619
    hero JAMESON, J. JONAH count 577
    hero ROBERTSON, JOE count 380
    hero PARKER, MAY count 377
    hero PUNISHER II/FRANK CA count 299
    hero NELSON, FRANKLIN FOG count 278
    hero LEEDS, BETTY BRANT count 249
    hero KINGPIN/WILSON FISK count 248
Community ID 1 ; number of members 50
    hero HULK/DR. ROBERT BRUC count 835
    hero DR. STRANGE/STEPHEN count 631
    hero SUB-MARINER/NAMOR MA count 530
    hero JONES, RICHARD MILHO count 322
    hero BANNER, BETTY ROSS T count 253
    hero WONG count 221
    hero ROSS, GEN. THADDEUS count 213
    hero NORRISS, SISTER BARB count 192
    hero CLEA count 191
    hero DOC SAMSON/DR. LEONA count 129
Community ID 4 ; number of members 37
    hero THING/BENJAMIN J. GR count 963
    hero HUMAN TORCH/JOHNNY S count 886
    hero MR. FANTASTIC/REED R count 854
    hero INVISIBLE WOMAN/SUE count 762
    hero SILVER SURFER/NORRIN count 310
    hero RICHARDS, FRANKLIN B count 270
    hero DR. DOOM/VICTOR VON count 270
    hero CRYSTAL [INHUMAN] count 262
    hero MASTERS, ALICIA REIS count 206
    hero MEDUSA/MEDUSALITH AM count 163
Community ID 6 ; number of members 30
    hero THOR/DR. DONALD BLAK count 956
    hero ODIN [ASGARDIAN] count 321
    hero LOKI [ASGARDIAN] count 255
    hero BALDER [ASGARDIAN] count 255
    hero SIF count 240
    hero FANDRAL [ASGARDIAN] count 234
    hero HOGUN [ASGARDIAN] count 234
    hero VOLSTAGG count 233
    hero HEIMDALL [ASGARDIAN] count 145
    hero ENCHANTRESS/AMORA/HE count 129
Community ID 2 ; number of members 19
    hero NOVA/RICHARD RIDER count 174
    hero NAMORITA/NITA PRENTI count 163
    hero FIRESTAR/ANGELICA JO count 156
    hero JUSTICE II/VANCE AST count 139
    hero SPEEDBALL/ROBBIE BAL count 120
    hero GEE/ALEX POWER count 111
    hero NIGHT THRASHER/DUANE count 91
    hero COUNTERWEIGHT II/KAT count 86
    hero COUNTERWEIGHT/JACK P count 84
    hero LIGHTSPEED/JULIE POW count 84
Community ID 11 ; number of members 17
    hero HUDSON, HEATHER count 172
    hero SASQUATCH/WALTER LAN count 141
    hero PUCK/EUGENE MILTON J count 131
    hero SHAMAN/MICHAEL TWOYO count 117
    hero NORTHSTAR/JEAN-PAUL count 113
    hero AURORA/JEANNE-MARIE count 107
    hero BOX IV/MADISON JEFFR count 87
    hero GUARDIAN/JAMES MACDO count 70
    hero SNOWBIRD/NARYA/ANNE count 60
    hero WILD CHILD/KYLE GIBN count 58
Community ID 13 ; number of members 9
    hero POWER MAN/ERIK JOSTE count 99
    hero BEETLE/ABNER RONALD count 97
    hero MOONSTONE II/KARLA S count 92
    hero SCREAMING MIMI/MELIS count 78
    hero CITIZEN V II/HELMUT count 69
    hero FIXER II/PAUL NORBER count 65
    hero JOLT/HALLIE TAKAHAMA count 49
    hero CHARCOAL/CHARLIE BUR count 38
    hero CITIZEN V III/DALLAS count 36
Community ID 16 ; number of members 8
    hero JUBILEE/JUBILATION L count 189
    hero WHITE QUEEN/EMMA FRO count 135
    hero SKIN/ANGELO ESPINOSA count 80
    hero CHAMBER/JONOTHON STA count 79
    hero HUSK/PAIGE GUTHRIE count 79
    hero SYNCH/EVERETT THOMAS count 71
    hero PENANCE/MONET ST. CR count 65
    hero M count 50
Now let's use the same layout and plotting helpers to draw our network.
#
# Do community detection
partition = community_louvain.best_partition(G, resolution=1.5)
print("Number of found communities",len(set(partition.values())))
#
# Layout the network so that the communities are clustered
pos = community_layout(G, partition)
values = [partition.get(node) for node in G.nodes()]
xpos = []
community = []
label = []
for node in G.nodes():
    print(G.nodes[node])
    community.append(partition.get(node))
    xpos.append(pos[node])
    label.append(G.nodes[node]['hero'])
xpos = np.asarray(xpos, dtype=np.float32)
#
# Now draw
nx.draw(G, pos, node_color=values)
#nx.draw(G, pos, cmap = plt.get_cmap('jet'), node_color = values, node_size=30, with_labels=False, font_size=6)
plt.show()
visualize_2d(xpos,colors=community,labels=label,color_text='Community',label_text='Hero',title="Marvel Network")
Number of found communities 23
{'weight': 561, 'hero': 'ANT-MAN/DR. HENRY J.'}
{'weight': 1334, 'hero': 'CAPTAIN AMERICA'}
{'weight': 605, 'hero': 'HAWK'}
{'weight': 322, 'hero': 'JONES, RICHARD MILHO'}
{'weight': 581, 'hero': 'WASP/JANET VAN DYNE'}
{'weight': 102, 'hero': 'PHARAOH RAMA-TUT'}
{'weight': 1150, 'hero': 'IRON MAN/TONY STARK'}
{'weight': 174, 'hero': 'MOCKINGBIRD/DR. BARB'}
{'weight': 390, 'hero': 'WONDER MAN/SIMON WIL'}
{'weight': 384, 'hero': 'BLACK WIDOW/NATASHA'}
{'weight': 262, 'hero': 'CRYSTAL [INHUMAN]'}
{'weight': 353, 'hero': 'HERCULES [GREEK GOD]'}
{'weight': 415, 'hero': 'SHE-HULK/JENNIFER WA'}
{'weight': 603, 'hero': 'VISION'}
{'weight': 318, 'hero': "BLACK PANTHER/T'CHAL"}
{'weight': 399, 'hero': 'JARVIS, EDWIN'}
{'weight': 631, 'hero': 'DR. STRANGE/STEPHEN'}
{'weight': 835, 'hero': 'HULK/DR. ROBERT BRUC'}
{'weight': 886, 'hero': 'HUMAN TORCH/JOHNNY S'}
{'weight': 854, 'hero': 'MR. FANTASTIC/REED R'}
{'weight': 643, 'hero': 'SCARLET WITCH/WANDA'}
{'weight': 530, 'hero': 'SUB-MARINER/NAMOR MA'}
{'weight': 342, 'hero': 'QUICKSILVER/PIETRO M'}
{'weight': 956, 'hero': 'THOR/DR. DONALD BLAK'}
{'weight': 762, 'hero': 'INVISIBLE WOMAN/SUE'}
{'weight': 176, 'hero': 'QUASAR III/WENDELL V'}
{'weight': 963, 'hero': 'THING/BENJAMIN J. GR'}
{'weight': 101, 'hero': 'SPIDER-WOMAN II/JULI'}
{'weight': 184, 'hero': 'USAGENT/CAPTAIN JOHN'}
{'weight': 635, 'hero': 'BEAST/HENRY &HANK& P'}
{'weight': 154, 'hero': 'MOONDRAGON/HEATHER D'}
{'weight': 169, 'hero': 'TIGRA/GREER NELSON'}
{'weight': 175, 'hero': 'CAPTAIN MARVEL II/MO'}
{'weight': 427, 'hero': 'ICEMAN/ROBERT BOBBY'}
{'weight': 585, 'hero': 'CYCLOPS/SCOTT SUMMER'}
{'weight': 1577, 'hero': 'SPIDER-MAN/PETER PARKER'}
{'weight': 211, 'hero': 'BLACK KNIGHT V/DANE'}
{'weight': 619, 'hero': 'DAREDEVIL/MATT MURDO'}
{'weight': 156, 'hero': 'FIRESTAR/ANGELICA JO'}
{'weight': 172, 'hero': 'BINARY/CAROL DANVERS'}
{'weight': 66, 'hero': 'ULTRON'}
{'weight': 221, 'hero': 'FALCON/SAM WILSON'}
{'weight': 444, 'hero': 'ANGEL/WARREN KENNETH'}
{'weight': 452, 'hero': 'COLOSSUS II/PETER RA'}
{'weight': 96, 'hero': 'DR. DRUID/ANTHONY LU'}
{'weight': 126, 'hero': 'HELLCAT/PATSY WALKER'}
{'weight': 55, 'hero': 'JOCASTA'}
{'weight': 444, 'hero': 'NIGHTCRAWLER/KURT WA'}
{'weight': 61, 'hero': "O'BRIEN, MICHAEL"}
{'weight': 496, 'hero': 'PROFESSOR X/CHARLES'}
{'weight': 96, 'hero': 'REDWING'}
{'weight': 141, 'hero': 'SASQUATCH/WALTER LAN'}
{'weight': 143, 'hero': 'SERSI/SYLVIA'}
{'weight': 523, 'hero': 'STORM/ORORO MUNROE S'}
{'weight': 63, 'hero': 'WHIZZER/ROBERT L. FR'}
{'weight': 819, 'hero': 'WOLVERINE/LOGAN'}
{'weight': 471, 'hero': 'FURY, COL. NICHOLAS'}
{'weight': 119, 'hero': 'CAPTAIN MARVEL/CAPTA'}
{'weight': 577, 'hero': 'JAMESON, J. JONAH'}
{'weight': 270, 'hero': 'RICHARDS, FRANKLIN B'}
{'weight': 141, 'hero': 'THUNDERSTRIKE/ERIC K'}
{'weight': 100, 'hero': 'GYRICH, HENRY PETER'}
{'weight': 92, 'hero': 'MOONSTONE II/KARLA S'}
{'weight': 69, 'hero': 'CITIZEN V II/HELMUT'}
{'weight': 227, 'hero': 'DUGAN, TIMOTHY ALOYI'}
{'weight': 270, 'hero': 'DR. DOOM/VICTOR VON'}
{'weight': 129, 'hero': 'ENCHANTRESS/AMORA/HE'}
{'weight': 135, 'hero': 'GALACTUS/GALAN'}
{'weight': 250, 'hero': 'IRON MAN IV/JAMES R.'}
{'weight': 215, 'hero': 'MAGNETO/MAGNUS/ERIC'}
{'weight': 320, 'hero': 'ROGUE /'}
{'weight': 255, 'hero': 'LOKI [ASGARDIAN]'}
{'weight': 131, 'hero': 'STARFOX/EROS'}
{'weight': 47, 'hero': 'BUCKY/BUCKY BARNES'}
{'weight': 113, 'hero': 'HUMAN TORCH ANDROID/'}
{'weight': 46, 'hero': 'TORO/TOM RAYMOND'}
{'weight': 39, 'hero': 'SPITFIRE/LADY JACQUE'}
{'weight': 136, 'hero': 'CARTER, SHARON'}
{'weight': 163, 'hero': 'NAMORITA/NITA PRENTI'}
{'weight': 62, 'hero': 'LIVING LIGHTNING/MIG'}
{'weight': 206, 'hero': 'MASTERS, ALICIA REIS'}
{'weight': 310, 'hero': 'SILVER SURFER/NORRIN'}
{'weight': 115, 'hero': 'UATU'}
{'weight': 78, 'hero': 'DIAMONDBACK II/RACHE'}
{'weight': 85, 'hero': 'NOMAD III/JACK MONRO'}
{'weight': 76, 'hero': 'ROSENTHAL, BERNIE'}
{'weight': 51, 'hero': 'COBRA/KLAUS VORHEES'}
{'weight': 37, 'hero': 'DEMOLITION MAN/DENNI'}
{'weight': 82, 'hero': 'CARTER, PEGGY'}
{'weight': 115, 'hero': 'JAMESON, COL. JOHN'}
{'weight': 40, 'hero': 'STANKOWICZ, FABIAN'}
{'weight': 130, 'hero': 'RED SKULL/JOHANN SCH'}
{'weight': 466, 'hero': 'MARVEL GIRL/JEAN GRE'}
{'weight': 174, 'hero': 'NOVA/RICHARD RIDER'}
{'weight': 224, 'hero': 'PSYLOCKE/ELISABETH B'}
{'weight': 80, 'hero': 'DRAX/ARTHUR DOUGLAS'}
{'weight': 97, 'hero': 'THANOS'}
{'weight': 139, 'hero': 'JUSTICE II/VANCE AST'}
{'weight': 103, 'hero': 'JONES, GABE'}
{'weight': 58, 'hero': 'RAGE/ELVIN DARYL HAL'}
{'weight': 48, 'hero': 'SWORDSMAN/JACQUES DU'}
{'weight': 38, 'hero': 'TAYLOR, LEILA'}
{'weight': 187, 'hero': 'MOON KNIGHT/MARC SPE'}
{'weight': 39, 'hero': 'FIREBIRD/BONITA JUAR'}
{'weight': 38, 'hero': 'CHARCOAL/CHARLIE BUR'}
{'weight': 49, 'hero': 'JOLT/HALLIE TAKAHAMA'}
{'weight': 99, 'hero': 'POWER MAN/ERIK JOSTE'}
{'weight': 97, 'hero': 'BEETLE/ABNER RONALD'}
{'weight': 78, 'hero': 'SCREAMING MIMI/MELIS'}
{'weight': 253, 'hero': 'BANNER, BETTY ROSS T'}
{'weight': 129, 'hero': 'DOC SAMSON/DR. LEONA'}
{'weight': 213, 'hero': 'ROSS, GEN. THADDEUS'}
{'weight': 68, 'hero': 'LEADER/SAM STERNS'}
{'weight': 74, 'hero': 'JONES, MARLO CHANDLE'}
{'weight': 120, 'hero': 'TALBOT, GLENN'}
{'weight': 49, 'hero': 'CAPTAIN MARVEL III/G'}
{'weight': 35, 'hero': 'TRIATHLON/DELROY GAR'}
{'weight': 83, 'hero': 'GARGOYLE II/ISAAC CH'}
{'weight': 259, 'hero': 'HAVOK/ALEX SUMMERS'}
{'weight': 192, 'hero': 'NORRISS, SISTER BARB'}
{'weight': 329, 'hero': 'SHADOWCAT/KATHERINE'}
{'weight': 197, 'hero': 'GAMBIT/REMY LEBEAU'}
{'weight': 124, 'hero': 'FORGE'}
{'weight': 131, 'hero': 'BISHOP /'}
{'weight': 307, 'hero': 'CANNONBALL II/SAM GU'}
{'weight': 189, 'hero': 'JUBILEE/JUBILATION L'}
{'weight': 223, 'hero': 'BANSHEE/SEAN CASSIDY'}
{'weight': 188, 'hero': 'BOOMER/TABITHA SMITH'}
{'weight': 258, 'hero': 'SUMMERS, NATHAN CHRI'}
{'weight': 212, 'hero': 'SUNSPOT/ROBERTO DACO'}
{'weight': 84, 'hero': 'APOCALYPSE/EN SABAH'}
{'weight': 71, 'hero': 'CALIBAN/'}
{'weight': 179, 'hero': 'MACTAGGERT, MOIRA KI'}
{'weight': 177, 'hero': 'POLARIS/LORNA DANE'}
{'weight': 57, 'hero': 'FIREFIST/RUSTY COLLI'}
{'weight': 36, 'hero': 'HODGE, CAMERON'}
{'weight': 65, 'hero': 'MADDICKS, ARTHUR ART'}
{'weight': 85, 'hero': 'RICTOR/JULIO ESTEBAN'}
{'weight': 225, 'hero': 'WOLFSBANE/RAHNE SINC'}
{'weight': 48, 'hero': 'SOUTHERN, CANDY'}
{'weight': 160, 'hero': 'GHOST RIDER II/JOHNN'}
{'weight': 64, 'hero': 'STRONG GUY/GUIDO CAR'}
{'weight': 70, 'hero': 'TILBY, TRISH/PATRICI'}
{'weight': 108, 'hero': 'ANT-MAN II/SCOTT HAR'}
{'weight': 103, 'hero': 'LYJA LAZERFIST [SKRU'}
{'weight': 107, 'hero': 'AURORA/JEANNE-MARIE'}
{'weight': 70, 'hero': 'GUARDIAN/JAMES MACDO'}
{'weight': 113, 'hero': 'NORTHSTAR/JEAN-PAUL'}
{'weight': 117, 'hero': 'SHAMAN/MICHAEL TWOYO'}
{'weight': 60, 'hero': 'SNOWBIRD/NARYA/ANNE'}
{'weight': 172, 'hero': 'HUDSON, HEATHER'}
{'weight': 49, 'hero': 'PERSUASION/KARA KILL'}
{'weight': 131, 'hero': 'PUCK/EUGENE MILTON J'}
{'weight': 57, 'hero': 'TALISMAN II/ELIZABET'}
{'weight': 58, 'hero': 'WILD CHILD/KYLE GIBN'}
{'weight': 87, 'hero': 'BOX IV/MADISON JEFFR'}
{'weight': 35, 'hero': 'BOX/ROGER BOCHS'}
{'weight': 73, 'hero': 'MARROW/SARAH'}
{'weight': 32, 'hero': 'CANTOR, VERA'}
{'weight': 96, 'hero': 'BLOB/FRED J. DUKES'}
{'weight': 58, 'hero': 'SKIDS/SALLY BLEVINS'}
{'weight': 134, 'hero': 'BLACK BOLT/BLACKANTO'}
{'weight': 135, 'hero': 'GORGON [INHUMAN]'}
{'weight': 132, 'hero': 'KARNAK [INHUMAN]'}
{'weight': 107, 'hero': 'LOCKJAW [INHUMAN]'}
{'weight': 163, 'hero': 'MEDUSA/MEDUSALITH AM'}
{'weight': 119, 'hero': 'TRITON'}
{'weight': 57, 'hero': 'MAXIMUS [INHUMAN]'}
{'weight': 48, 'hero': 'LYNNE, MONICA'}
{'weight': 35, 'hero': 'ROSS, EVERETT KENNET'}
{'weight': 54, 'hero': 'PETROVITCH, IVAN'}
{'weight': 278, 'hero': 'NELSON, FRANKLIN FOG'}
{'weight': 37, 'hero': 'BROTHER VOODOO/DANIE'}
{'weight': 28, 'hero': 'DRUMM, JERICHO'}
{'weight': 232, 'hero': 'CAGE, LUKE/CARL LUCA'}
{'weight': 199, 'hero': 'IRON FIST/DANIEL RAN'}
{'weight': 32, 'hero': 'HOGARTH, JERYN'}
{'weight': 121, 'hero': 'KNIGHT, MISTY'}
{'weight': 97, 'hero': 'WING, COLLEEN'}
{'weight': 30, 'hero': 'TEMPLE, CLAIRE'}
{'weight': 279, 'hero': 'CAPTAIN BRITAIN/BRIA'}
{'weight': 139, 'hero': 'MEGGAN'}
{'weight': 134, 'hero': 'PHOENIX III/RACHEL S'}
{'weight': 43, 'hero': 'STUART, DR. ALISTAIR'}
{'weight': 114, 'hero': 'CYPHER/DOUG RAMSEY'}
{'weight': 114, 'hero': 'LOCKHEED'}
{'weight': 43, 'hero': 'WIDGET'}
{'weight': 141, 'hero': 'DAZZLER II/ALLISON B'}
{'weight': 58, 'hero': 'DAYTRIPPER/AMANDA SE'}
{'weight': 126, 'hero': 'MAGIK/ILLYANA RASPUT'}
{'weight': 66, 'hero': 'LONGSHOT'}
{'weight': 114, 'hero': 'MYSTIQUE/RAVEN DARKH'}
{'weight': 79, 'hero': 'SUMMERS, MADELYNE MA'}
{'weight': 172, 'hero': 'MIRAGE II/DANIELLE M'}
{'weight': 94, 'hero': 'LILANDRA NERAMANI [S'}
{'weight': 55, 'hero': 'PHOENIX II'}
{'weight': 52, 'hero': 'CORSAIR'}
{'weight': 50, 'hero': 'CALLISTO'}
{'weight': 52, 'hero': 'WISDOM, PETER'}
{'weight': 46, 'hero': 'LUNA/LUNA MAXIMOFF ['}
{'weight': 39, 'hero': 'MASTERMIND/JASON WYN'}
{'weight': 65, 'hero': 'MULTIPLE MAN/JAMES A'}
{'weight': 30, 'hero': 'GLADIATOR/MELVIN POT'}
{'weight': 248, 'hero': 'KINGPIN/WILSON FISK'}
{'weight': 165, 'hero': 'PAGE, KAREN'}
{'weight': 299, 'hero': 'PUNISHER II/FRANK CA'}
{'weight': 158, 'hero': 'URICH, BEN'}
{'weight': 45, 'hero': 'NELSON, DEBBIE HARRI'}
{'weight': 38, 'hero': 'BLAKE, BECKY'}
{'weight': 51, 'hero': 'GLENN, HEATHER'}
{'weight': 26, 'hero': "O'BREEN, GLORIANNA"}
{'weight': 41, 'hero': 'BULLSEYE II/BENJAMIN'}
{'weight': 46, 'hero': 'DARKSTAR/LAYNIA SERG'}
{'weight': 29, 'hero': 'VANGUARD/NICOLAI KRY'}
{'weight': 74, 'hero': 'QUARTERMAIN, CLAY'}
{'weight': 93, 'hero': 'HELLSTORM/DAIMON HEL'}
{'weight': 221, 'hero': 'WONG'}
{'weight': 30, 'hero': 'BLESSING, MORGANA'}
{'weight': 65, 'hero': 'WOLFE, SARA'}
{'weight': 74, 'hero': 'ANCIENT ONE'}
{'weight': 54, 'hero': 'BARON MORDO/KARL MOR'}
{'weight': 191, 'hero': 'CLEA'}
{'weight': 67, 'hero': 'DORMAMMU'}
{'weight': 28, 'hero': 'CHANG, IMEI'}
{'weight': 60, 'hero': 'RINTRAH'}
{'weight': 33, 'hero': 'UMAR'}
{'weight': 54, 'hero': 'TOPAZ'}
{'weight': 73, 'hero': 'NIGHTMARE/EDVARD HAB'}
{'weight': 129, 'hero': 'NIGHTHAWK II/KYLE RI'}
{'weight': 26, 'hero': 'CLOUD'}
{'weight': 31, 'hero': 'BLOODSTORM | MUTANT'}
{'weight': 29, 'hero': 'BRUTE | MUTANT X-VER'}
{'weight': 29, 'hero': 'ICE-MAN | MUTANT X-V'}
{'weight': 116, 'hero': 'COOPER, DR. VALERIE'}
{'weight': 321, 'hero': 'ODIN [ASGARDIAN]'}
{'weight': 29, 'hero': 'ARES [GREEK GOD]'}
{'weight': 44, 'hero': 'ZEUS'}
{'weight': 45, 'hero': 'ABOMINATION/EMIL BLO'}
{'weight': 34, 'hero': 'WILSON, JIM'}
{'weight': 28, 'hero': 'ROCK/SAMUEL JOHN ROC'}
{'weight': 33, 'hero': 'NORRISS, JACK'}
{'weight': 72, 'hero': 'WIZARD/BENTLEY WITTM'}
{'weight': 81, 'hero': 'HARKNESS, AGATHA'}
{'weight': 74, 'hero': 'NOVA II/FRANKIE RAYE'}
{'weight': 64, 'hero': 'MOLE MAN/HARVEY RUPE'}
{'weight': 81, 'hero': 'MS. MARVEL II/SHARON'}
{'weight': 42, 'hero': 'KRISTOFF/KRISTOFF VE'}
{'weight': 58, 'hero': 'TRAPSTER/PETER PETRU'}
{'weight': 56, 'hero': 'PUPPET MASTER/PHILLI'}
{'weight': 113, 'hero': 'SANDMAN/WILLIAM BAKE'}
{'weight': 71, 'hero': 'WINGFOOT, WYATT'}
{'weight': 55, 'hero': 'IKARIS/IKE HARRIS [E'}
{'weight': 71, 'hero': 'MAKKARI/MIKE KHARY/I'}
{'weight': 55, 'hero': 'THENA'}
{'weight': 28, 'hero': 'DAMIAN, MARGO'}
{'weight': 30, 'hero': 'RICHARDS, NATHANIEL'}
{'weight': 36, 'hero': 'SCARFE, RAFAEL'}
{'weight': 87, 'hero': 'ARBOGAST, BAMBI'}
{'weight': 119, 'hero': 'HOGAN, VIRGINIA PEPP'}
{'weight': 58, 'hero': 'MADAME MASQUE/GIULIE'}
{'weight': 82, 'hero': 'MANDARIN'}
{'weight': 38, 'hero': 'WHIPLASH/MARK SCARLO'}
{'weight': 30, 'hero': 'ERWIN, MORLEY'}
{'weight': 91, 'hero': 'SITWELL, JASPER'}
{'weight': 57, 'hero': 'CABE, BETHANY'}
{'weight': 32, 'hero': 'CENTURY'}
{'weight': 115, 'hero': 'HOGAN, HAROLD J. HAP'}
{'weight': 26, 'hero': 'ZIMMER, ABE'}
{'weight': 54, 'hero': 'MANTIS/? BRANDT'}
{'weight': 30, 'hero': 'RODGERS, MARIANNE'}
{'weight': 33, 'hero': 'ERWIN, CLYTEMNESTRA'}
{'weight': 176, 'hero': 'KA-ZAR/KEVIN PLUNDER'}
{'weight': 113, 'hero': "SHANNA/SHANNA O'HARA"}
{'weight': 163, 'hero': 'ZABU'}
{'weight': 95, 'hero': 'FRENCHIE/JEAN-PAUL D'}
{'weight': 82, 'hero': 'ALRAUNE, MARLENE'}
{'weight': 30, 'hero': 'CRAWLEY, BETRAND'}
{'weight': 36, 'hero': 'RED GUARDIAN III/DR.'}
{'weight': 49, 'hero': 'DIAMOND LIL/LILLIAN'}
{'weight': 127, 'hero': 'THUNDERBIRD II/JAMES'}
{'weight': 36, 'hero': 'BALLANTINE, KAYLA'}
{'weight': 87, 'hero': 'ROM, SPACEKNIGHT'}
{'weight': 72, 'hero': 'STARSHINE II/BRANDY'}
{'weight': 32, 'hero': 'JACKSON, STEVE'}
{'weight': 28, 'hero': 'GOBLYN'}
{'weight': 31, 'hero': 'MANIKIN/DR. WHITMAN'}
{'weight': 33, 'hero': 'PATHWAY/LAURA DEAN'}
{'weight': 55, 'hero': 'TOAD/MORTIMER TOYNBE'}
{'weight': 40, 'hero': 'MASON, LOUISE'}
{'weight': 45, 'hero': 'BANNON, LANCE'}
{'weight': 65, 'hero': 'CUSHING, KATE'}
{'weight': 91, 'hero': 'GREEN GOBLIN/NORMAN'}
{'weight': 249, 'hero': 'LEEDS, BETTY BRANT'}
{'weight': 54, 'hero': 'MERCADO, JOY'}
{'weight': 377, 'hero': 'PARKER, MAY'}
{'weight': 622, 'hero': 'WATSON-PARKER, MARY'}
{'weight': 101, 'hero': 'DR. OCTOPUS/OTTO OCT'}
{'weight': 57, 'hero': 'ELECTRO/MAX DILLON'}
{'weight': 78, 'hero': 'LEEDS, NED'}
{'weight': 56, 'hero': 'VULTURE/ADRIAN TOOME'}
{'weight': 32, 'hero': 'HOBGOBLIN II/RODERIC'}
{'weight': 380, 'hero': 'ROBERTSON, JOE'}
{'weight': 243, 'hero': 'THOMPSON, EUGENE FLA'}
{'weight': 34, 'hero': 'ROBERTSON, MARTHA'}
{'weight': 72, 'hero': 'LIZARD/DR. CURTIS CO'}
{'weight': 58, 'hero': 'THUNDERBALL/DR. ELIO'}
{'weight': 110, 'hero': 'GRANT, GLORIA GLORY'}
{'weight': 28, 'hero': 'WATSON, KRISTY'}
{'weight': 69, 'hero': 'HOBGOBLIN V/JASON PH'}
{'weight': 182, 'hero': 'OSBORN, HARRY'}
{'weight': 43, 'hero': 'WHITMAN, DEBRA'}
{'weight': 28, 'hero': 'IONELLO, JASON'}
{'weight': 145, 'hero': 'OSBORN, LIZ ALLAN'}
{'weight': 131, 'hero': 'WATSON, ANNA'}
{'weight': 104, 'hero': 'BLACK CAT/FELICIA HA'}
{'weight': 42, 'hero': 'ARRANGER/'}
{'weight': 63, 'hero': 'LUBENSKI, NATE'}
{'weight': 167, 'hero': 'SPIDER-MAN CLONE/BEN'}
{'weight': 49, 'hero': 'OSBORN, NORMIE'}
{'weight': 42, 'hero': 'HAMMERHEAD'}
{'weight': 32, 'hero': 'STACY, ARTHUR'}
{'weight': 45, 'hero': 'CHAMELEON/DMITRI SME'}
{'weight': 49, 'hero': 'MYSTERIO/QUENTIN BEC'}
{'weight': 35, 'hero': 'SHA SHAN'}
{'weight': 37, 'hero': 'DEWOLFF, JEAN'}
{'weight': 35, 'hero': 'MUGGINS, MAMIE'}
{'weight': 44, 'hero': 'SCHEMER/RICHARD FISK'}
{'weight': 73, 'hero': 'JAMESON, MARLA MADIS'}
{'weight': 34, 'hero': 'KAFKA, DR. ASHLEY'}
{'weight': 57, 'hero': 'STACY, JILL'}
{'weight': 36, 'hero': 'PUMA/THOMAS FIREHEAR'}
{'weight': 66, 'hero': 'ROBERTSON, RANDY'}
{'weight': 35, 'hero': 'KATZENBERG, NICK'}
{'weight': 33, 'hero': 'STACY, CAPT. GEORGE'}
{'weight': 93, 'hero': 'STACY, GWEN'}
{'weight': 30, 'hero': 'KANE, MARCY/KAINA'}
{'weight': 75, 'hero': 'VENOM/EDDIE BROCK'}
{'weight': 47, 'hero': 'KAINE'}
{'weight': 45, 'hero': 'JACKAL/MILES WARREN'}
{'weight': 27, 'hero': 'CARNAGE/CLETUS KASAD'}
{'weight': 114, 'hero': 'SPIDER-WOMAN/JESSICA'}
{'weight': 40, 'hero': 'MCCABE, LINDSAY'}
{'weight': 52, 'hero': 'STINGRAY/DR. WALTER'}
{'weight': 71, 'hero': 'BLACK KING/SEBASTIAN'}
{'weight': 84, 'hero': 'VASHTI'}
{'weight': 48, 'hero': 'ATTUMA'}
{'weight': 31, 'hero': 'MARRS(-PAYNE), PHOEB'}
{'weight': 32, 'hero': 'NEWELL, DIANE ARLISS'}
{'weight': 37, 'hero': 'MARRINA/MARRINA SMAL'}
{'weight': 31, 'hero': 'KRANG [ATLANTEAN]'}
{'weight': 66, 'hero': 'DORMA [ATLANTEAN]'}
{'weight': 48, 'hero': 'THINKER'}
{'weight': 33, 'hero': 'THUNDRA'}
{'weight': 255, 'hero': 'BALDER [ASGARDIAN]'}
{'weight': 234, 'hero': 'FANDRAL [ASGARDIAN]'}
{'weight': 39, 'hero': 'HILDEGARDE [ASGARDIA'}
{'weight': 234, 'hero': 'HOGUN [ASGARDIAN]'}
{'weight': 103, 'hero': 'KARNILLA [ASGARDIAN]'}
{'weight': 240, 'hero': 'SIF'}
{'weight': 233, 'hero': 'VOLSTAGG'}
{'weight': 145, 'hero': 'HEIMDALL [ASGARDIAN]'}
{'weight': 32, 'hero': 'TANA NILE'}
{'weight': 71, 'hero': 'ABSORBING MAN/CARL C'}
{'weight': 121, 'hero': 'KINCAID, DR. JANE FO'}
{'weight': 44, 'hero': 'BULLDOZER/HENRY CAMP'}
{'weight': 44, 'hero': 'PILEDRIVER II/BRIAN'}
{'weight': 56, 'hero': 'WRECKER III/DIRK GAR'}
{'weight': 38, 'hero': 'DESTROYER III'}
{'weight': 99, 'hero': 'VIZIER'}
{'weight': 45, 'hero': 'ULIK'}
{'weight': 40, 'hero': 'FRIGGA'}
{'weight': 72, 'hero': 'HELA [ASGARDIAN]'}
{'weight': 44, 'hero': 'LORELEI II/MELODI [A'}
{'weight': 97, 'hero': 'HIGH EVOLUTIONARY/HE'}
{'weight': 51, 'hero': 'BETA RAY BILL'}
{'weight': 35, 'hero': 'ANALYZER'}
{'weight': 40, 'hero': 'EXECUTIONER II/SKURG'}
{'weight': 96, 'hero': 'WEREWOLF BY NIGHT/JA'}
{'weight': 32, 'hero': 'COWEN, BUCK'}
{'weight': 38, 'hero': 'RUSSELL, LISSA'}
{'weight': 88, 'hero': 'SABRETOOTH/VICTOR CR'}
{'weight': 38, 'hero': 'TYGER TIGER/JESSAN H'}
{'weight': 86, 'hero': 'DE LA FONTAINE, CONT'}
{'weight': 29, 'hero': 'NEVILLE, KATE'}
{'weight': 38, 'hero': 'PIERCE, ALEXANDER GO'}
{'weight': 29, 'hero': 'MACKENZIE, AL'}
{'weight': 29, 'hero': 'NETWORK NINA'}
{'weight': 32, 'hero': 'ARCANNA/ARCANNA JONE'}
{'weight': 41, 'hero': 'DR. SPECTRUM/JOSEPH'}
{'weight': 44, 'hero': 'HYPERION'}
{'weight': 28, 'hero': 'POWER PRINCESS/ZARDA'}
{'weight': 39, 'hero': 'WHIZZER II/STANLEY S'}
{'weight': 29, 'hero': 'LADY LARK/LINDA LEWI'}
{'weight': 39, 'hero': 'AUSTIN, SUSAN'}
{'weight': 39, 'hero': 'MASTERSON, KEVIN'}
{'weight': 28, 'hero': 'STEELE, MARCY MASTER'}
{'weight': 32, 'hero': 'BLOODAXE/JACKIE LUKU'}
{'weight': 79, 'hero': 'CHAMBER/JONOTHON STA'}
{'weight': 79, 'hero': 'HUSK/PAIGE GUTHRIE'}
{'weight': 80, 'hero': 'SKIN/ANGELO ESPINOSA'}
{'weight': 71, 'hero': 'SYNCH/EVERETT THOMAS'}
{'weight': 135, 'hero': 'WHITE QUEEN/EMMA FRO'}
{'weight': 65, 'hero': 'PENANCE/MONET ST. CR'}
{'weight': 50, 'hero': 'M'}
{'weight': 52, 'hero': 'AVALANCHE/DOMINIC PE'}
{'weight': 42, 'hero': 'DESTINY II/IRENE ADL'}
{'weight': 63, 'hero': 'PYRO/ALLERDYCE JOHNN'}
{'weight': 37, 'hero': 'BEDLAM/JESSE AARONSO'}
{'weight': 87, 'hero': 'DOMINO III/BEATRICE/'}
{'weight': 36, 'hero': 'FERAL/MARIA CALLASAN'}
{'weight': 61, 'hero': 'SHATTERSTAR II/GAVEE'}
{'weight': 105, 'hero': 'SIRYN/THERESA ROURKE'}
{'weight': 114, 'hero': 'WARLOCK III'}
{'weight': 58, 'hero': "KARMA/XI'AN COY MANH"}
{'weight': 71, 'hero': 'MAGMA/AMARA AQUILLA/'}
{'weight': 86, 'hero': 'COUNTERWEIGHT II/KAT'}
{'weight': 84, 'hero': 'COUNTERWEIGHT/JACK P'}
{'weight': 111, 'hero': 'GEE/ALEX POWER'}
{'weight': 84, 'hero': 'LIGHTSPEED/JULIE POW'}
{'weight': 58, 'hero': 'POWER, DR. JIM'}
{'weight': 83, 'hero': 'WARLOCK II/ADAM WARL'}
{'weight': 111, 'hero': 'MEPHISTO'}
{'weight': 29, 'hero': 'SHALLA BAL II'}
{'weight': 35, 'hero': 'JOSEPH'}
{'weight': 45, 'hero': 'TITANIA II/MARY SKEE'}
{'weight': 65, 'hero': 'FIXER II/PAUL NORBER'}
{'weight': 88, 'hero': 'X-MAN/NATHAN GREY'}
{'weight': 91, 'hero': 'NIGHT THRASHER/DUANE'}
{'weight': 120, 'hero': 'SPEEDBALL/ROBBIE BAL'}
{'weight': 46, 'hero': 'SILHOUETTE'}
{'weight': 50, 'hero': 'BLAQUESMITH'}
{'weight': 48, 'hero': 'BRIDGE, GEORGE WASHI'}
{'weight': 45, 'hero': 'MERRYWEATHER, IRENE'}
{'weight': 36, 'hero': 'CITIZEN V III/DALLAS'}
{'weight': 36, 'hero': 'BLACK MAMBA/TANYA SE'}
{'weight': 31, 'hero': 'ASP II/CLEO'}
{'weight': 38, 'hero': 'CROSSBONES/BROCK BIN'}
{'weight': 40, 'hero': 'WOO, JIMMY'}
{'weight': 37, 'hero': "CH'OD"}
{'weight': 37, 'hero': "MAM'SELLE HEPZIBAH"}
{'weight': 37, 'hero': 'RAZA LONGKNIFE'}
{'weight': 30, 'hero': 'KARKAS [DEVIANT]'}
{'weight': 29, 'hero': 'REJECT/RAN-SAK [DEVI'}
{'weight': 42, 'hero': 'GAMORA'}
{'weight': 46, 'hero': 'PIP/PRINCE GOFERN'}
{'weight': 31, 'hero': 'SPEEDBALL II/DARRION'}
{'weight': 48, 'hero': 'TURBO II (A)/MICHIKO'}
{'weight': 91, 'hero': 'MICROCHIP/LINUS LIEB'}
{'weight': 34, 'hero': 'MENTOR/ALARS [ETERNA'}
{'weight': 30, 'hero': 'CHORD, ANDREW'}
{'weight': 66, 'hero': 'DARKHAWK/CHRIS POWEL'}
{'weight': 53, 'hero': 'POWER, MARGARET'}
{'weight': 30, 'hero': 'LEWIS, SHIRLEY WASHI'}
{'weight': 31, 'hero': 'TRAINER, DR. SEWARD'}
{'weight': 32, 'hero': 'BALDWIN, MADELYNE MA'}
{'weight': 30, 'hero': 'EBONY'}
{'weight': 35, 'hero': "S'YM"}
{'weight': 105, 'hero': 'CLOAK/TYRONE JOHNSON'}
{'weight': 113, 'hero': 'DAGGER/TANDY BOWEN'}
{'weight': 31, 'hero': 'MAYHEM/DET. BRIGID O'}
{'weight': 27, 'hero': 'DELGADO, FATHER'}
{'weight': 120, 'hero': 'GHOST RIDER III/DAN'}
{'weight': 41, 'hero': 'BLAZE, ROXANNE SIMPS'}
{'weight': 26, 'hero': 'BRIGHTWIND'}
{'weight': 60, 'hero': 'LEECH'}
{'weight': 151, 'hero': 'DRACULA/VLAD TEPES'}
{'weight': 52, 'hero': 'VAN HELSING, RACHEL'}
{'weight': 42, 'hero': 'BLADE/'}
{'weight': 64, 'hero': 'DRAKE, FRANKLIN'}
{'weight': 40, 'hero': 'HARKER, QUINCY'}
{'weight': 174, 'hero': 'SHANG-CHI'}
{'weight': 88, 'hero': 'WU, LEIKO'}
{'weight': 49, 'hero': 'FU MANCHU'}
{'weight': 85, 'hero': 'RESTON, CLIVE'}
{'weight': 84, 'hero': 'SMITH, SIR DENIS NAY'}
{'weight': 102, 'hero': 'TARR, BLACK JACK'}
{'weight': 71, 'hero': 'HOWARD THE DUCK'}
{'weight': 45, 'hero': 'SWITZLER, BEVERLY'}
The above analysis is interesting, but few problems that we will encounter in a typical analysis environment look like the Marvel Universe. It is reasonable to ask whether these tools might be applicable in other circumstances.
The idea of community detection seems like one that might have such broader application. Community detection is just a form of clustering, which itself is an example of "unsupervised" learning. Can we use community detection in a real-world example? The answer is yes.
To do this we will use "fake" datasets (generated using sklearn tools), as well as some helper functions in the code block below. The helper functions will allow us to visualize the generated datasets.
sklearn comes with a powerful tool for generating datasets called "make_classification". The generated datasets can then be used to test other algorithms such as classification and regression. We will use it to test the clustering ability of graph analytics algorithms.
The important parameters for "make_classification" are the following:
from sklearn.datasets import make_classification
import pandas as pd
X,Y = make_classification(n_samples=1000, n_features=10,
n_redundant=1, n_repeated=0, n_classes=4, n_clusters_per_class=1,
class_sep=2.0,
flip_y=0)
Xd = pd.DataFrame(X)
Yd = pd.Series(Y)
print(Xd)
0 1 2 3 4 5 6 \
0 1.359535 0.464795 -0.053798 0.369262 -0.044798 -2.152139 -1.155315
1 0.103109 0.734970 -0.451676 0.182308 -0.310400 2.635645 0.766861
2 1.818586 0.954591 0.158701 -2.564483 -0.254704 -1.580301 -1.151705
3 0.220463 -1.998717 -0.959936 1.760293 0.566426 1.507623 -0.037483
4 2.898046 -1.388416 -1.042000 -1.055744 -0.575806 -0.756992 0.728297
.. ... ... ... ... ... ... ...
995 -1.095808 1.390119 0.082220 -0.949850 0.137168 -1.042716 -0.286036
996 0.516732 -2.935709 -1.188293 -0.531739 0.685456 2.661871 1.210939
997 1.979105 1.337773 0.726912 0.280291 0.398967 -2.290153 0.763525
998 -0.502896 3.773463 0.414188 1.333357 2.780392 -2.864771 0.153753
999 -0.001274 1.065130 1.839870 0.293900 -0.699497 1.393639 0.975563
7 8 9
0 0.004498 0.328634 -0.586413
1 1.748896 0.391823 2.302387
2 1.019220 -1.458051 0.314386
3 0.756195 -0.780980 -1.621580
4 0.512622 0.252081 -2.091399
.. ... ... ...
995 1.331509 -0.893918 1.130944
996 1.078246 0.868829 -2.142451
997 1.515411 -3.213423 0.400201
998 -1.006818 1.437116 3.051575
999 0.148517 -0.394796 2.039189
[1000 rows x 10 columns]
from collections import defaultdict
from functools import partial
from itertools import repeat
def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()
#
# Loop over the classes we generated above and see how many we have in each
classNums = defaultdict(int)
for y in Y:
    classNums[y] += 1
print("Class numbers ",classNums)
Class numbers defaultdict(<class 'int'>, {0: 250, 3: 250, 2: 250, 1: 250})
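The counting loop above can also be written with numpy. The following is a minimal sketch, assuming a small hypothetical label array (y_demo) in place of the Y generated above:

```python
import numpy as np

# Hypothetical label array standing in for Y from make_classification
y_demo = np.array([0, 3, 2, 1, 0, 1, 2, 3, 0, 0])

# np.unique returns the sorted distinct labels and how often each one occurs
classes, counts = np.unique(y_demo, return_counts=True)
class_counts = {int(c): int(n) for c, n in zip(classes, counts)}
print("Class numbers", class_counts)
```

A balanced dataset, like the one generated above, shows the same count for every class.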
Our data has 10 dimensions (from the 10 features we generated it with). How do we visualize it? We could make a network of this data (and we will do that below) and then do community detection and plot the resulting network as we did above. Instead, we will use a method called t-SNE to project our 10 dimensions down to 2 or 3.
From Wikipedia:
t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. It is based on Stochastic Neighbor Embedding originally developed by Sam Roweis and Geoffrey Hinton,[1] where Laurens van der Maaten proposed the t-distributed variant.[2] It is a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.
Our "visualize_3d" and "visualize_2d" functions already incorporate t-SNE if the dimension of the data given to them is above 3 or 2, respectively. We will see that our 4 generated classes appear as 4 separate clusters after the application of t-SNE.
visualize_3d(Xd,Yd,Yd)
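As a rough sketch of the projection step that the helper functions perform, the block below applies sklearn's TSNE directly. The data (two hypothetical blobs in 10 dimensions) and the perplexity value are assumptions made for illustration; they are not the settings used inside visualize_3d.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical stand-in data: two well-separated blobs in 10 dimensions
blob_a = rng.normal(loc=0.0, scale=1.0, size=(40, 10))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(40, 10))
X_demo = np.vstack([blob_a, blob_b])

# Project down to 2 dimensions; perplexity must be smaller than the number of samples
reducer = TSNE(n_components=2, perplexity=10.0, random_state=47)
X_2d = reducer.fit_transform(X_demo)
print(X_2d.shape)
```

Each row of X_2d is the 2D location of the corresponding 10-dimensional point, which can then be passed to any scatter plot.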
We need some way to define edges, in order to be able to connect our datapoints. To do this, we will use a concept we introduced previously: cosine similarity. Our points each lie in an n-dimensional space (defined by the features). By calculating the cosine of the angle between each pair of points a and b, we can define a strength of the connection:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
We are assuming that, since the data is generated to come from different classes, points from the same class should have a larger cosine similarity (point in more similar directions) than points from different classes.
from numpy import dot
from numpy.linalg import norm
def cosine_similarity(a,b):
    cos_sim = dot(a, b)/(norm(a)*norm(b))
    return cos_sim
simSame = []
simDifferent = []
labels = []
index = 0
for y in Y:
    labels.append(index)
    index += 1
edges = nested_defaultdict(float,2)
for x1,y1,index1 in zip(X,Y,labels):
    for x2,y2,index2 in zip(X,Y,labels):
        if index2>index1:
            sim = cosine_similarity(x1,x2)
            edges[index1][index2] = sim
            edges[index2][index1] = sim
            if y1==y2:
                simSame.append(sim)
            else:
                simDifferent.append(sim)
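The double loop above is easy to follow but slow for large samples. A possible vectorized alternative is sketched below: it computes all pairwise cosine similarities at once. The array X_demo is hypothetical, and the sketch assumes no row is all zeros (otherwise the normalization would divide by zero).

```python
import numpy as np

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(6, 4))  # hypothetical data: 6 points, 4 features

# Scale every row to unit length; the dot product of two unit vectors is their cosine
unit_rows = X_demo / np.linalg.norm(X_demo, axis=1, keepdims=True)
sim_matrix = unit_rows @ unit_rows.T  # sim_matrix[i, j] = cosine similarity of points i and j

# Cross-check one entry against the pairwise formula
a, b = X_demo[0], X_demo[3]
pairwise = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(sim_matrix[0, 3], pairwise))
```

The resulting matrix is symmetric with ones on the diagonal, so only the upper triangle is needed when defining edges.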
Look at points of the same class, versus points of different classes.
from matplotlib import pyplot
import numpy as np
bins = np.linspace(-1, 1, 100)
pyplot.hist(simSame, bins, alpha=0.5, label='same')
pyplot.hist(simDifferent, bins, alpha=0.5, label='different')
pyplot.legend(loc='upper right')
pyplot.show()
From the histogram above, it looks like we can treat two points as connected when their cosine similarity is greater than 0.5.
With this definition, we can form a graph.
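Before building the graph, the choice of cut can also be checked numerically rather than by eye. The sketch below uses short hypothetical arrays in place of simSame and simDifferent, and a hypothetical helper named fraction_above, to show the idea: a good cut keeps most same-class pairs and few different-class pairs.

```python
import numpy as np

# Hypothetical similarity values standing in for simSame and simDifferent
sim_same_demo = np.array([0.9, 0.8, 0.7, 0.6, 0.4])
sim_diff_demo = np.array([0.6, 0.3, 0.1, -0.2, -0.5])

def fraction_above(values, cut):
    """Fraction of similarity values strictly above the cut."""
    values = np.asarray(values)
    return float(np.mean(values > cut))

cut = 0.5
frac_same = fraction_above(sim_same_demo, cut)
frac_diff = fraction_above(sim_diff_demo, cut)
print(frac_same, frac_diff)
```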
import networkx as nx
import community
# Now look for communities
G = nx.Graph()
nodeCountCut = 5.0
edgeCut = 0.5
#
# Now find edges that connect good nodes
numEdges = 0
numGoodEdges = 0
goodEdgeNodes = set()
nodeEdgeCount = defaultdict(int)
for index1 in edges:
    for index2 in edges[index1]:
        if index1 != index2:
            numEdges += 1
            if edges[index1][index2] > edgeCut:
                numGoodEdges += 1
                G.add_edge(index1, index2, weight=edges[index1][index2])
                goodEdgeNodes.add(index1)
                goodEdgeNodes.add(index2)
                nodeEdgeCount[index1] += 1
                nodeEdgeCount[index2] += 1
#
# Next add every node to the graph, and count those that actually have at least one connection
numNodes = 0
numGoodNodes = 0
for x,y,index in zip(X,Y,labels):
    # print(y,index)
    G.add_node(index,weight=nodeEdgeCount[index],trueclass=y,nodenum=index)
    numNodes += 1
    if nodeEdgeCount[index]>0:
        numGoodNodes += 1
print("Total number all nodes ",numNodes)
print("Total number passing cuts nodes ",numGoodNodes)
print("Total number all edges ",numEdges)
print("Total number good edges ",numGoodEdges)
Total number all nodes 1000
Total number passing cuts nodes 1000
Total number all edges 999000
Total number good edges 179344
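Note that the edge counts printed above include each pair of points twice, once in each direction. As a minimal sketch of an alternative construction, the block below builds a thresholded graph from a similarity matrix and visits each pair only once. The 4x4 matrix sim_demo and the names G_demo and edge_cut are hypothetical.

```python
import numpy as np
import networkx as nx

# Hypothetical 4x4 cosine-similarity matrix (symmetric, ones on the diagonal)
sim_demo = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.0],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.0, 0.7, 1.0],
])
edge_cut = 0.5

G_demo = nx.Graph()
G_demo.add_nodes_from(range(sim_demo.shape[0]))  # keep isolated nodes too

# Upper triangle (k=1 skips the diagonal) visits each unordered pair once
rows, cols = np.triu_indices(sim_demo.shape[0], k=1)
for i, j in zip(rows, cols):
    if sim_demo[i, j] > edge_cut:
        G_demo.add_edge(int(i), int(j), weight=float(sim_demo[i, j]))

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
```

Only the pairs (0, 1) and (2, 3) pass the cut here, so the graph falls into two connected groups.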
#first compute the best partition
# The smaller the "resolution", the more communities you get
resolution = 1.0
partition = community.best_partition(G,weight='weight', resolution=resolution)
print("Number of found communities",len(set(partition.values())))
Number of found communities 4
The number of found communities is the same as the number of classes. Remember: this was totally unsupervised! Now we need to see if the communities actually correspond to the true classes.
This is a little more complicated than the Marvel Universe. In our case, each point already belongs to a class, and we want to know how often points of the same class are assigned to the same community.
#
# Layout the network so that the communities are clustered
pos = community_layout(G, partition)
xpos = []
community_color = []
label = []
index_list = []
community_by_index = {}
for node in G.nodes():
    community_color.append(partition.get(node))
    xpos.append(pos[node])
    label.append(G.nodes[node]['trueclass'])
    index = G.nodes[node]['nodenum']
    community_by_index[index]=partition.get(node)
community_by_index_list = []
print("assigning indices")
for index in range(len(Xd)):
    if index in community_by_index:
        community_by_index_list.append( community_by_index[index])
    else:
        community_by_index_list.append(-1)
xpos = np.asarray(xpos, dtype=np.float32)
#
# Now draw
nx.draw(G, pos, node_color=community_color)
plt.show()
#
# Visualize using plotly but NOT using network layout - using t-sne
visualize_2d(Xd,colors=community_by_index_list,labels=Yd,color_text='Found Community',label_text='True Class',title="Fake Data")
assigning indices
transforming
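To put a number on how well the found communities line up with the true classes, one option is a contingency table. The sketch below uses pandas crosstab on short hypothetical label lists (true_class and found_comm); with the data above, Yd and community_by_index_list would play these roles. The "purity" figure is one possible summary, not something defined in the text above.

```python
import pandas as pd

# Hypothetical labels: the true class and the found community for 8 points
true_class = pd.Series([0, 0, 1, 1, 2, 2, 3, 3], name="True Class")
found_comm = pd.Series([2, 2, 0, 0, 1, 1, 3, 0], name="Found Community")

# Rows are true classes, columns are found communities, cells are counts
table = pd.crosstab(true_class, found_comm)
print(table)

# For each found community, the true class that contributes the most points
majority_class = table.idxmax(axis=0)
# Fraction of points that belong to the majority class of their community
purity = table.max(axis=0).sum() / table.to_numpy().sum()
print(majority_class.to_dict(), purity)
```

Community numbers are arbitrary labels, so a good result shows one dominant entry per column rather than a diagonal table.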
Now let's apply this to a data sample we used previously: pulsars. What I want you to do is to
An extra credit portion will deal with matching found communities to true classes.
The code below sets things up by reading the data in!
import pandas as pd
#
# Read in the pulsar (HTRU2) data
fname = 'https://raw.githubusercontent.com/big-data-analytics-physics/data/master/HTRU2/HTRU_2a.csv'
dfAll = pd.read_csv(fname)
print(dfAll.head(5))
#
# The data already has a 0/1 class variable that defines signal (1) and background (0)
#
# The data is already combined but it will be useful to split it so we can look at
# signal and background separately.
dfA = dfAll[dfAll['class']==1]
dfB = dfAll[dfAll['class']==0]
print("Length of signal sample: ",len(dfA))
print("Length of background sample: ",len(dfB))
#
# Shuffle the data here
from sklearn.utils import shuffle
dfBShuffle = shuffle(dfB)
#
# By default dfB is limited to the same length as dfA.
# Uncomment the next line (and comment out the one after it) to use the full background sample instead.
#dfB_use = dfBShuffle
dfB_use = dfBShuffle.head(len(dfA))
dfCombined = dfB_use
dfCombined = pd.concat([dfCombined, dfA])
dfCombined = shuffle(dfCombined)
print("Size of signal sample ",len(dfA))
print("Size of background sample ",len(dfB_use))
print("Size of combined sample ",len(dfCombined))
from sklearn.utils import shuffle
dfCombinedShuffle = shuffle(dfCombined,random_state=42) # by setting the random state we will get reproducible results
X = dfCombinedShuffle.iloc[:,:8].to_numpy()
Y = dfCombinedShuffle['class'].values
Profile_mean Profile_stdev Profile_skewness Profile_kurtosis DM_mean \
0 140.562500 55.683782 -0.234571 -0.699648 3.199833
1 102.507812 58.882430 0.465318 -0.515088 1.677258
2 103.015625 39.341649 0.323328 1.051164 3.121237
3 136.750000 57.178449 -0.068415 -0.636238 3.642977
4 88.726562 40.672225 0.600866 1.123492 1.178930
DM_stdev DM_skewness DM_kurtosis class
0 19.110426 7.975532 74.242225 0
1 14.860146 10.576487 127.393580 0
2 21.744669 7.735822 63.171909 0
3 20.959280 6.896499 53.593661 0
4 11.468720 14.269573 252.567306 0
Length of signal sample: 1639
Length of background sample: 16259
Size of signal sample 1639
Size of background sample 1639
Size of combined sample 3278
Calculate the cosine similarity between all of the data points in the pulsar dataset. This is done almost exactly like we did above for our fake dataset.
# your code here
from numpy import dot
from numpy.linalg import norm
simSame = []
simDifferent = []
labels = []
index = 0
for y in Y:
    labels.append(index)
    index += 1
edges = nested_defaultdict(float,2)
for x1,y1,index1 in zip(X,Y,labels):
    for x2,y2,index2 in zip(X,Y,labels):
        if index2>index1:
            sim = cosine_similarity(x1,x2)
            edges[index1][index2] = sim
            edges[index2][index1] = sim
            if y1==y2:
                simSame.append(sim)
            else:
                simDifferent.append(sim)
Plot the cosine similarity of the signal vs background datasets.
# your code here
from matplotlib import pyplot
import numpy as np
bins = np.linspace(0, 1, 100)
pyplot.hist(simSame, bins, alpha=0.5, label='same')
pyplot.hist(simDifferent, bins, alpha=0.5, label='different')
pyplot.legend(loc='upper right')
pyplot.show()
Plot the 3D visualization using t-SNE (using visualize_3d). This might take a while since there are a lot of points to plot.
visualize_3d(X,Y,Y)
/fs/ess/PAS2038/PHYSICS5680_OSU/jupyter/lib64/python3.6/site-packages/plotly/express/_core.py:137: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
Run community detection on the pulsar dataset. Use a cut of 0.9 on cosine similarity, and see how many communities you get. Remember that this is an unsupervised task: the community detection algorithm does not know how many classes are in the sample.
Make sure you print out how many classes are found.
import networkx as nx
import community
# Now look for communities
G = nx.Graph()
nodeCountCut = 5.0
edgeCut = 0.9
#
# Now find edges that connect good nodes
numEdges = 0
numGoodEdges = 0
goodEdgeNodes = set()
nodeEdgeCount = defaultdict(int)
for index1 in edges:
    for index2 in edges[index1]:
        if index1 != index2:
            numEdges += 1
            if edges[index1][index2] > edgeCut:
                numGoodEdges += 1
                G.add_edge(index1, index2, weight=edges[index1][index2])
                goodEdgeNodes.add(index1)
                goodEdgeNodes.add(index2)
                nodeEdgeCount[index1] += 1
                nodeEdgeCount[index2] += 1
#
# Next add every node to the graph, and count those that actually have at least one connection
numNodes = 0
numGoodNodes = 0
for x,y,index in zip(X,Y,labels):
    # print(y,index)
    G.add_node(index,weight=nodeEdgeCount[index],trueclass=y,nodenum=index)
    numNodes += 1
    if nodeEdgeCount[index]>0:
        numGoodNodes += 1
print("Total number all nodes ",numNodes)
print("Total number passing cuts nodes ",numGoodNodes)
print("Total number all edges ",numEdges)
print("Total number good edges ",numGoodEdges)
Total number all nodes 3278
Total number passing cuts nodes 3278
Total number all edges 10742006
Total number good edges 3423314
#first compute the best partition
# The smaller the "resolution", the more communities you get
resolution = 1.0
partition = community.best_partition(G,weight='weight', resolution=resolution)
print("Number of found communities",len(set(partition.values())))
Number of found communities 3
#
# Layout the network so that the communities are clustered
pos = community_layout(G, partition)
xpos = []
community_color = []
label = []
index_list = []
community_by_index = {}
for node in G.nodes():
    community_color.append(partition.get(node))
    xpos.append(pos[node])
    label.append(G.nodes[node]['trueclass'])
    index = G.nodes[node]['nodenum']
    community_by_index[index]=partition.get(node)
community_by_index_list = []
print("assigning indices")
for index in range(len(X)):
    if index in community_by_index:
        community_by_index_list.append( community_by_index[index])
    else:
        community_by_index_list.append(-1)
xpos = np.asarray(xpos, dtype=np.float32)
assigning indices
# xpos = []
# community_color = []
# label = []
# index_list = []
# community_by_index = {}
print(np.shape(xpos))
print(np.shape(community_color))
print(np.shape(label))
print(np.shape(index_list))
print(np.shape(community_by_index))
(3278, 2)
(3278,)
(3278,)
(0,)
()
# Now draw
nx.draw(G, pos, node_color=community_color)
plt.show()
# Visualize using plotly but NOT using network layout - using t-sne
visualize_2d(X,colors=community_by_index_list,labels=Y,color_text='Found Community',label_text='True Class',title="Pulsar Data")
transforming
For the above pulsar dataset, after community detection, determine separately
Use the Karate Network, and figure out how to provide a size to the nodes using a new version of the visualize_2d method.
You will have to modify the 2nd and 3rd blocks to accomplish this task.
import networkx as nx
import community
#
# Get the graph from networkx
gk = nx.karate_club_graph()
for node in gk.nodes():
    print(gk.nodes[node],gk.degree(node))
for u,v,data in gk.edges(data=True):
    print(u, v, data)
{'club': 'Mr. Hi'} 16
{'club': 'Mr. Hi'} 9
{'club': 'Mr. Hi'} 10
{'club': 'Mr. Hi'} 6
{'club': 'Mr. Hi'} 3
{'club': 'Mr. Hi'} 4
{'club': 'Mr. Hi'} 4
{'club': 'Mr. Hi'} 4
{'club': 'Mr. Hi'} 5
{'club': 'Officer'} 2
{'club': 'Mr. Hi'} 3
{'club': 'Mr. Hi'} 1
{'club': 'Mr. Hi'} 2
{'club': 'Mr. Hi'} 5
{'club': 'Officer'} 2
{'club': 'Officer'} 2
{'club': 'Mr. Hi'} 2
{'club': 'Mr. Hi'} 2
{'club': 'Officer'} 2
{'club': 'Mr. Hi'} 3
{'club': 'Officer'} 2
{'club': 'Mr. Hi'} 2
{'club': 'Officer'} 2
{'club': 'Officer'} 5
{'club': 'Officer'} 3
{'club': 'Officer'} 3
{'club': 'Officer'} 2
{'club': 'Officer'} 4
{'club': 'Officer'} 3
{'club': 'Officer'} 4
{'club': 'Officer'} 4
{'club': 'Officer'} 6
{'club': 'Officer'} 12
{'club': 'Officer'} 17
0 1 {}
0 2 {}
0 3 {}
0 4 {}
0 5 {}
0 6 {}
0 7 {}
0 8 {}
0 10 {}
0 11 {}
0 12 {}
0 13 {}
0 17 {}
0 19 {}
0 21 {}
0 31 {}
1 2 {}
1 3 {}
1 7 {}
1 13 {}
1 17 {}
1 19 {}
1 21 {}
1 30 {}
2 3 {}
2 7 {}
2 8 {}
2 9 {}
2 13 {}
2 27 {}
2 28 {}
2 32 {}
3 7 {}
3 12 {}
3 13 {}
4 6 {}
4 10 {}
5 6 {}
5 10 {}
5 16 {}
6 16 {}
8 30 {}
8 32 {}
8 33 {}
9 33 {}
13 33 {}
14 32 {}
14 33 {}
15 32 {}
15 33 {}
18 32 {}
18 33 {}
19 33 {}
20 32 {}
20 33 {}
22 32 {}
22 33 {}
23 25 {}
23 27 {}
23 29 {}
23 32 {}
23 33 {}
24 25 {}
24 27 {}
24 31 {}
25 31 {}
26 29 {}
26 33 {}
27 33 {}
28 31 {}
28 33 {}
29 32 {}
29 33 {}
30 32 {}
30 33 {}
31 32 {}
31 33 {}
32 33 {}
#
# You will have to modify this!
def visualize_2d_size(X,colors,labels,color_text='Color',label_text='Label',algorithm="tsne",title="Data in 2D"):
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    if algorithm=="tsne":
        reducer = TSNE(n_components=2,random_state=47,n_iter=300,early_exaggeration=3.0)
    elif algorithm=="pca":
        reducer = PCA(n_components=2,random_state=47)
    else:
        raise ValueError("Unsupported dimensionality reduction algorithm given.")
    if X.shape[1]>2:
        print("transforming")
        X = reducer.fit_transform(X)
    else:
        if type(X)==pd.DataFrame:
            X=X.values
    colors = pd.Series(colors)
    colors = colors.apply(str)
    fig = px.scatter(x=X[:,0], y=X[:,1],color=colors,labels=labels,
                     custom_data=[colors,labels],
                     color_discrete_sequence=px.colors.qualitative.Dark24,
                     size_max=5.0)
    #fig.update_traces(marker={'size': 3})
    fig.update_traces(marker=dict(line=dict(width=2,
                                            color='DarkSlateGrey')),
                      selector=dict(mode='markers'))
    fig.update_traces(hovertemplate=color_text+':%{customdata[0]}<br>'+label_text+':%{customdata[1]}')
    fig.show()
#
# You will have to modify this!
#
# Do community detection
partition_gk = community_louvain.best_partition(gk, resolution=1.5)
print("Number of found communities",len(set(partition_gk.values())))
#
# Layout the network so that the communities are clustered
pos = community_layout(gk, partition_gk)
xpos = []
community = []
label = []
for node in gk.nodes():
    community.append(partition_gk.get(node))
    xpos.append(pos[node])
    label.append(gk.nodes[node]['club'])
xpos = np.asarray(xpos, dtype=np.float32)
#
# Visualize using plotly
visualize_2d_size(xpos,colors=community,labels=label,color_text='Found Community',label_text='True Club',title="Karate Network")
Number of found communities 2
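With two communities found, a natural check is how they line up with the two real clubs. The following is a minimal sketch, assuming the `community` and `label` lists built in the cell above. Here they are replaced by short made-up lists so the block runs on its own, and the names `found` and `true_club` are illustrative only.

```python
from collections import Counter

# Made-up stand-ins for the `community` and `label` lists (not real results)
found = [0, 0, 1, 1, 1, 0]
true_club = ['Mr. Hi', 'Mr. Hi', 'Officer', 'Officer', 'Mr. Hi', 'Mr. Hi']

# Count how often each (found community, true club) pair occurs
overlap = Counter(zip(found, true_club))
for (comm, club), count in sorted(overlap.items()):
    print(comm, club, count)
```

If the detection matches the real split well, most of the counts should fall into one club per community, with only a few nodes in the mixed pairs.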